feat: add local whisper.cpp voice transcription provider #157

Open
thereisnotime wants to merge 1 commit into RichardAtCT:main from thereisnotime:feat/local-whisper-cpp-provider

Conversation

@thereisnotime

Summary

  • Adds a third voice transcription provider (VOICE_PROVIDER=local) that uses whisper.cpp and ffmpeg for fully offline, API-key-free voice message transcription
  • New settings: WHISPER_CPP_BINARY_PATH and WHISPER_CPP_MODEL_PATH for configuring the local binary and model
  • Dedicated setup guide at docs/local-whisper-cpp.md with build-from-source instructions, model download links, and troubleshooting tips

Changes

  • src/bot/features/voice_handler.py — new _transcribe_local() pipeline: OGG→WAV (ffmpeg) → whisper.cpp binary
  • src/config/settings.py — whisper_cpp_binary_path, whisper_cpp_model_path fields + resolver properties
  • src/config/features.py — local provider skips API key check
  • src/bot/features/registry.py — updated key-availability logic
  • src/bot/handlers/message.py / src/bot/orchestrator.py — provider-aware error messages
  • docs/local-whisper-cpp.md — full build & setup guide
  • .env.example, CLAUDE.md, README.md, docs/configuration.md — documentation updates
  • Tests — full coverage for local provider (ffmpeg, binary, model, empty output, non-zero exit)
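The OGG→WAV→whisper.cpp pipeline listed above can be sketched as two command lines. This is a minimal illustration, not the PR's actual code: the function names build_ffmpeg_cmd and build_whisper_cmd are hypothetical, and the exact flags the PR passes are not shown in this conversation. The flags used here (ffmpeg's -ar/-ac/-c:a and whisper.cpp's -m/-f/-nt) are standard options of those tools.

```python
from pathlib import Path
from typing import List


def build_ffmpeg_cmd(ogg_path: Path, wav_path: Path) -> List[str]:
    """Build argv that converts an OGG voice note to 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file if it already exists
        "-i", str(ogg_path),  # input file
        "-ar", "16000",       # resample to 16 kHz, the rate whisper.cpp expects
        "-ac", "1",           # downmix to mono
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM
        str(wav_path),
    ]


def build_whisper_cmd(binary: str, model_path: Path, wav_path: Path) -> List[str]:
    """Build argv for a whisper.cpp CLI run on a WAV file (hypothetical helper)."""
    return [
        binary,
        "-m", str(model_path),  # ggml model file
        "-f", str(wav_path),    # audio file to transcribe
        "-nt",                  # no timestamps in the printed text
    ]
```

Both lists are meant to be unpacked into asyncio.create_subprocess_exec, which runs the program directly without a shell.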

Test plan

  • Run existing test suite (pytest) — all tests should pass
  • Verify VOICE_PROVIDER=local with whisper.cpp installed transcribes a real voice message
  • Verify clear error messages when ffmpeg / whisper.cpp binary / model file is missing
  • Verify VOICE_PROVIDER=mistral and VOICE_PROVIDER=openai still work unchanged

🤖 Generated with Claude Code

@thereisnotime thereisnotime force-pushed the feat/local-whisper-cpp-provider branch from 9524828 to affa44f Compare March 20, 2026 00:38
@thereisnotime
Author

Hey @RichardAtCT 👋 — would appreciate a review when you get a chance! This adds a local whisper.cpp voice transcription provider (no API keys needed).

@FridayOpenClawBot

PR Review
Reviewed head: affa44f2a351a86e7bb4e3834cc8b6504b6299e0

Summary

  • Adds a third voice transcription provider (VOICE_PROVIDER=local) backed by whisper.cpp + ffmpeg — fully offline, no API key required
  • New settings WHISPER_CPP_BINARY_PATH / WHISPER_CPP_MODEL_PATH with sensible defaults and named-model resolution to ~/.cache/whisper-cpp/ggml-{name}.bin
  • Full unit test coverage for all error paths (ffmpeg missing, binary missing, model missing, empty output, non-zero exit)
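The named-model resolution mentioned in the summary might look roughly like the sketch below. The function name resolve_model_path and the rule for telling a bare name from a path are assumptions for illustration; only the ~/.cache/whisper-cpp/ggml-{name}.bin layout comes from the review.

```python
from pathlib import Path

# Cache directory named in the review summary.
DEFAULT_MODEL_DIR = Path.home() / ".cache" / "whisper-cpp"


def resolve_model_path(value: str, model_dir: Path = DEFAULT_MODEL_DIR) -> Path:
    """Resolve a model setting that is either a bare name or a file path.

    Assumption: a value with a directory component or a ".bin" suffix is a
    path; anything else (e.g. "base", "base.en") is treated as a model name.
    """
    candidate = Path(value).expanduser()
    looks_like_path = candidate.suffix == ".bin" or len(candidate.parts) > 1
    if not looks_like_path:
        candidate = model_dir / f"ggml-{value}.bin"
    if not candidate.is_file():
        raise ValueError(
            f"whisper.cpp model not found at {candidate}; "
            "set WHISPER_CPP_MODEL_PATH to a model name or file path"
        )
    return candidate
```

Raising ValueError matches the exception type the later review comment expects the resolver properties to raise.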

What looks good

  • Clean provider abstraction — _transcribe_local is well-isolated and the existing Mistral/OpenAI paths are untouched
  • Tempfile cleanup in a finally block is correct; no risk of leaking WAV files even on failure
  • Error messages are actionable (include install commands and env var names) — good UX for a self-hosted setup
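The finally-based cleanup praised above follows a common pattern, sketched here under assumptions: transcribe_with_cleanup and the run_pipeline callback are hypothetical names, and the PR's real handler may structure this differently.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def transcribe_with_cleanup(
    ogg_bytes: bytes, run_pipeline: Callable[[Path, Path], str]
) -> str:
    """Write the voice note to a temp file, run the pipeline, always clean up."""
    fd, ogg_name = tempfile.mkstemp(suffix=".ogg")
    ogg_path = Path(ogg_name)
    wav_path = ogg_path.with_suffix(".wav")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(ogg_bytes)
        # run_pipeline stands in for the ffmpeg + whisper.cpp steps.
        return run_pipeline(ogg_path, wav_path)
    finally:
        # Runs on success and on any exception, so neither file is leaked.
        for path in (ogg_path, wav_path):
            path.unlink(missing_ok=True)
```

Using missing_ok=True (Python 3.8+) keeps the cleanup safe when the pipeline fails before the WAV file is ever created.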

Issues / questions

  1. [Important] src/bot/features/voice_handler.py — Neither _convert_ogg_to_wav nor _run_whisper_cpp has a timeout. process.communicate() will block indefinitely if ffmpeg or whisper.cpp stalls. A near-20 MB file on a slow machine (or a model file that takes a long time to load the first time) could tie up the bot until the process exits or is killed externally. Consider asyncio.wait_for(process.communicate(), timeout=120) (or whatever the existing GIT_OPERATIONS_TIMEOUT pattern uses), raising a RuntimeError("transcription timed out") on expiry so the user gets feedback.

  2. [Nit] src/bot/features/voice_handler.py — _resolve_whisper_binary validates via shutil.which(binary) but returns the original unresolved string (binary), discarding the fully-qualified path (resolved). This is fine for subprocess dispatch since PATH lookup happens again at exec time, but it means the validated path isn't reused — if PATH somehow changes between validation and execution, the nice error message is bypassed and you'd get a raw FileNotFoundError. Returning resolved from the method would make validation and execution consistent.
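The second point can be illustrated with a small sketch that returns the path shutil.which() found rather than the input string. The standalone function and the default name "whisper-cli" are assumptions; the PR's method lives on the handler and its default is not shown here.

```python
import shutil


def resolve_whisper_binary(binary: str = "whisper-cli") -> str:
    """Return the fully-qualified path of the whisper.cpp binary.

    The default name is an assumption for illustration only.
    """
    resolved = shutil.which(binary)
    if resolved is None:
        raise RuntimeError(
            f"whisper.cpp binary '{binary}' not found; "
            "set WHISPER_CPP_BINARY_PATH or add it to PATH"
        )
    # Return what was validated, so execution uses exactly the same file.
    return resolved
```

Passing the returned value to create_subprocess_exec means a later PATH change cannot make the process launch differ from what was checked.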

Verdict
⚠️ Merge after fixes — timeout on subprocess calls is the main gap; everything else is solid.

Friday, AI assistant to @RichardAtCT

@RichardAtCT
Owner

> Hey @RichardAtCT 👋 — would appreciate a review when you get a chance! This adds a local whisper.cpp voice transcription provider (no API keys needed).

Thanks - great idea. I actually use local whisper everywhere else so this makes sense!

Can you please fix the timeout flagged by @FridayOpenClawBot and the failing lint? Then it is good to merge.

Add a third voice provider option (VOICE_PROVIDER=local) that transcribes
Telegram voice messages entirely offline using whisper.cpp and ffmpeg.
No API keys or cloud services required.

- New local provider in voice_handler.py (OGG->WAV via ffmpeg, then whisper.cpp)
- Settings: WHISPER_CPP_BINARY_PATH, WHISPER_CPP_MODEL_PATH
- Feature flag, registry, and error messages updated for local provider
- Dedicated build/setup guide at docs/local-whisper-cpp.md
- Full test coverage for the local provider path
- Updated .env.example, CLAUDE.md, README.md, docs/configuration.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@thereisnotime thereisnotime force-pushed the feat/local-whisper-cpp-provider branch from affa44f to 5501304 Compare March 20, 2026 08:58
@thereisnotime
Author

Thanks for the review @RichardAtCT and @FridayOpenClawBot! Both issues have been addressed:

  1. Timeouts — Added asyncio.wait_for(..., timeout=120) to both _convert_ogg_to_wav and _run_whisper_cpp. On timeout the subprocess is killed and a clear RuntimeError is raised.
  2. Resolved binary path — _resolve_whisper_binary now caches and returns the fully-qualified path from shutil.which() so validation and execution are consistent.
  3. Lint — Ran black + isort on all affected files.

Also added docs/setup.md updates with the local provider configuration example and a link to the full build guide.

@RichardAtCT
Owner

Good feature addition — offline transcription is genuinely useful and the architecture fits cleanly into the existing provider pattern. Several issues need addressing before merge.


🐛 Critical: No subprocess timeouts

Both ffmpeg and whisper.cpp are awaited with no timeout. A hung process (corrupted audio, slow disk, large model) will leave the awaiting handler stuck indefinitely, with the child process still running. Add asyncio.wait_for:

try:
    _, ffmpeg_stderr = await asyncio.wait_for(
        ffmpeg_proc.communicate(), timeout=30.0
    )
except asyncio.TimeoutError:
    ffmpeg_proc.kill()
    await ffmpeg_proc.wait()  # reap the killed process
    raise RuntimeError("ffmpeg timed out after 30s")

Same for the whisper subprocess. Timeout values should be configurable via settings (e.g. whisper_cpp_timeout: int = 120).
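Since the same handling applies to both subprocesses, one option is a shared helper that takes the timeout as a parameter. This is a sketch only: communicate_with_timeout is a hypothetical name, and the timeout values would come from settings such as the suggested whisper_cpp_timeout.

```python
import asyncio
from typing import Tuple


async def communicate_with_timeout(
    proc: asyncio.subprocess.Process, timeout: float, label: str
) -> Tuple[bytes, bytes]:
    """Await proc.communicate(), killing the process if it exceeds timeout."""
    try:
        return await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()  # reap the child so it does not linger as a zombie
        raise RuntimeError(f"{label} timed out after {timeout:g}s") from None
```

A call site would look like `stdout, stderr = await communicate_with_timeout(whisper_proc, settings.whisper_cpp_timeout, "whisper.cpp")`, where both names on the right are placeholders for whatever the handler actually holds.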


🔒 Minor: Temp file path construction

wav_path = ogg_path.replace(".ogg", ".wav") is fragile. Use:

wav_path = Path(ogg_path).with_suffix(".wav")
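To make the fragility concrete, here is a short comparison. The example paths are made up, and PurePosixPath is used only so the results are the same on every platform.

```python
from pathlib import PurePosixPath

# str.replace rewrites every ".ogg" in the path, including directory names.
nested = "/tmp/cache.ogg/voice.ogg"
naive_nested = nested.replace(".ogg", ".wav")
safe_nested = str(PurePosixPath(nested).with_suffix(".wav"))

# With an unexpected extension, str.replace silently returns the input path,
# so the "WAV" output would point at the source file itself.
upper = "/tmp/voice.OGG"
naive_upper = upper.replace(".ogg", ".wav")
safe_upper = str(PurePosixPath(upper).with_suffix(".wav"))

print(naive_nested)  # /tmp/cache.wav/voice.wav
print(safe_nested)   # /tmp/cache.ogg/voice.wav
print(naive_upper)   # /tmp/voice.OGG
print(safe_upper)    # /tmp/voice.wav
```

with_suffix only touches the final component's extension, which is the behavior the conversion step needs.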

Also: WHISPER_CPP_BINARY_PATH is passed directly to create_subprocess_exec. No shell injection risk since create_subprocess_exec doesn't invoke a shell — but worth documenting explicitly. Consider validating the resolved path is executable at startup.


⚠️ Misconfiguration errors surface at request time, not startup

whisper_cpp_binary_path_resolved and whisper_cpp_model_path_resolved are computed on every request. A user who misconfigures the binary path won't find out until they send a voice message. Add a validate_local_provider() method called at bot startup (alongside existing provider validation) that calls both properties once and catches ValueError. Much better UX.
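A startup check along those lines could be sketched as follows. LocalVoiceSettings is a hypothetical stand-in for the project's settings class, and this version checks the files directly instead of calling the PR's resolver properties, whose code is not shown in this conversation.

```python
import shutil
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class LocalVoiceSettings:
    """Hypothetical stand-in for the relevant fields of the real settings."""

    whisper_cpp_binary_path: str
    whisper_cpp_model_path: str


def validate_local_provider(
    settings: LocalVoiceSettings, ffmpeg_binary: str = "ffmpeg"
) -> None:
    """Raise ValueError at startup if the local provider cannot run."""
    problems: List[str] = []
    if shutil.which(ffmpeg_binary) is None:
        problems.append(f"ffmpeg not found ('{ffmpeg_binary}')")
    if shutil.which(settings.whisper_cpp_binary_path) is None:
        problems.append(
            "WHISPER_CPP_BINARY_PATH is not an executable: "
            f"{settings.whisper_cpp_binary_path}"
        )
    if not Path(settings.whisper_cpp_model_path).expanduser().is_file():
        problems.append(
            "WHISPER_CPP_MODEL_PATH is not a file: "
            f"{settings.whisper_cpp_model_path}"
        )
    if problems:
        # Report every problem at once so the operator can fix them together.
        raise ValueError(
            "VOICE_PROVIDER=local is misconfigured: " + "; ".join(problems)
        )
```

Calling this once during bot startup, only when VOICE_PROVIDER=local, would surface a bad path before any user sends a voice message.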


🔤 Type annotations

  • shutil is imported inside the property body — move to module-level imports per isort requirements
  • Field(None, ...) — the ... as second positional arg is unusual; prefer Field(default=None, description="...") for clarity and mypy friendliness
  • If whisper_cpp_timeout is added as recommended, ensure it's typed

🧪 Test coverage

Confirm tests cover:

  • ffmpeg failure (non-zero return code)
  • whisper.cpp failure
  • Empty transcription result
  • Temp file cleanup after each failure path
  • Timeout scenario (once timeout handling is added — this is a must-have test before merge)

Minor

  • Log the resolved binary/model path at INFO level on first use (structlog) — helps ops debugging
  • docs/local-whisper-cpp.md should note the ffmpeg system dependency explicitly

Summary: Clean implementation that fits the provider pattern well. The timeout issue is the main blocker — a hung whisper.cpp process will leave the voice handler stuck indefinitely in production. Everything else is polish.

Friday, AI assistant to @RichardAtCT (posted as @RichardAtCT — FridayOpenClawBot access pending)
